Draft: control nonce space and timeouts for all chip topologies #546

Merged
shufps merged 7 commits into shufps:develop from warioishere:feature/asic-nonce-space
Apr 12, 2026

@warioishere
Contributor

warioishere commented Apr 4, 2026

Summary

Port of Bitaxe ESP-Miner PR 420 - dynamic Hash Counting Number (HCN) calculation for BM1366/BM1368/BM1370 ASICs.

Required for SV2 Standard Channel support where no extranonce is available and the ASIC must search the full nonce space.

Changes

  • Added setNonceSpace(frequency, asic_count, cores) to Asic base class
  • Added getCoreCount() to each ASIC subclass (BM1366=112, BM1368=80, BM1370=128)
  • Replaced setVrFrequency(vrFrequency) with setNonceSpace() in each init()
  • Formula: HCN = (2^32 / next_pow2(cores) / next_pow2(asics)) * FREQ_MULT / freq * 0.5
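The formula above can be sketched as follows. This is a minimal illustration, not the code from the PR: `compute_hcn` is a hypothetical name, and `FREQ_MULT = 25` is an assumption taken from the arithmetic in the register table of this description. The frequency is assumed to be in MHz.

```cpp
#include <cassert>
#include <cstdint>

// Round up to the next power of two (returns v itself if it already is one).
static inline uint32_t next_power_of_two(uint16_t v) {
    uint32_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

// Assumption: the frequency multiplier is 25, as in the table's arithmetic.
static const double FREQ_MULT = 25.0;

// HCN = (2^32 / next_pow2(cores) / next_pow2(asics)) * FREQ_MULT / freq * 0.5
// compute_hcn is a hypothetical helper name used only for this sketch.
static inline uint32_t compute_hcn(double freq_mhz, uint16_t asic_count, uint16_t cores) {
    assert(freq_mhz > 0.0 && asic_count > 0 && cores > 0);
    const double nonce_space = 4294967296.0; // 2^32
    const double per_core =
        nonce_space / next_power_of_two(cores) / next_power_of_two(asic_count);
    return static_cast<uint32_t>(per_core * FREQ_MULT / freq_mhz * 0.5);
}
```

Because the core and chip counts are rounded up to powers of two, BM1366 (112 cores), BM1368 (80 cores) and BM1370 (128 cores) all divide the nonce space by 128 in this sketch, and doubling the frequency halves the resulting value.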

Open question: Register 0x10 conflict

NerdQAxePlus uses register 0x10 for Version Rolling Frequency (vrFreqToReg), while ESP-Miner uses it for Hash Counting Number. These produce very different values:

| Purpose | NerdQAxe++ (4x BM1370 @ 615MHz) | Register value |
|---|---|---|
| VR Frequency (25kHz) | VR_REG_PER_HZ / 25000 | ~7,864 |
| HCN (PR #420) | (2^32/128/4) * 25/615 * 0.5 | ~170,993 |

Need review: Can register 0x10 serve both purposes? Does the HCN calculation implicitly set the correct VR timing? Or do we need both writes?

cc @shufps @mutatrum @adammwest for review of the register 0x10 semantics.

Test plan

  • V1 mining still works with new HCN values
  • Version rolling still functions correctly
  • SV2 Standard Channel can search full nonce space

Replaces static setVrFrequency with computed Hash Counting Number
(HCN) on register 0x10, based on bitaxeorg/ESP-Miner PR#420.

Formula: HCN = (2^32 / next_pow2(cores) / next_pow2(asics)) * FREQ_MULT / freq * 0.5

Core counts: BM1366=112, BM1368=80, BM1370=128

NOTE: NerdQAxePlus previously used register 0x10 for Version Rolling
Frequency (vrFreqToReg). This change replaces that with the HCN value.
The interaction between VR frequency and HCN on register 0x10 needs
review - they produce very different values for the same register.
@shufps
Owner

shufps commented Apr 5, 2026

Hi,

> Need review: Can register 0x10 serve both purposes? Does the HCN calculation implicitly set the correct VR timing? Or do we need both writes?

I experimentally verified that the 0x10 sets the version rolling frequency by logging nonce wrap around times and how fast the version counter advances.

Not sure what Bitaxe thinks this register is for.

And I don't see (yet) why changing the 0x10 might be necessary for SV2.

The only case I found where tweaking 0x10 becomes necessary is when the ASIC clock is so high that the search nonce wraps around in the search space before the version counter is incremented, leading to duplicate shares.

Increasing the VR-frequency fixes it then.

I only observed that at ASIC frequencies of around 1100MHz and higher.

@warioishere
Contributor Author

warioishere commented Apr 5, 2026

Thanks for the clarification! That makes sense for V1/Extended Channel where new work arrives every ~500ms via extranonce_2 increment.

The reason we opened this is for SV2 Standard Channel (from our SV2 PR #544):

In Standard Channel, the pool provides the complete block header - the miner has no extranonce to increment. The miner must rely entirely on nonce (32-bit) + version rolling to find shares. New work only arrives when the pool sends a new job, which is typically every 30-60 seconds depending on pool template settings.

I just verified this on hardware: the miner produces duplicate shares within seconds in Standard Channel mode. The current VR frequency setting causes the ASIC to exhaust its search space too quickly.

So the question is: can we adjust register 0x10 to give the ASIC enough nonce + version rolling search space for 30-60 seconds of autonomous mining?

That's also why we tagged @mutatrum and @adammwest - we're not ASIC firmware experts and wanted their input on the register 0x10 semantics.

@warioishere
Contributor Author

warioishere commented Apr 5, 2026

Some concrete numbers on why this matters for SV2 Standard Channel:

Total search space with full nonce (2^32) + version rolling (0xFFFF = 2^16):

2^48 ≈ 281.5 TH

| Device | Hashrate | Time to exhaust full search space |
|---|---|---|
| NerdQAxe+ | 2.5 TH/s | ~113 seconds |
| NerdQAxe++ | 4.8 TH/s | ~59 seconds |
| NerdOCTAXE | 9.0 TH/s | ~31 seconds |

With pool template intervals of 30-60 seconds, only the NerdQAxe+ would be safe if the full search space is utilized. The NerdQAxe++ and NerdOCTAXE would additionally need ntime rolling to avoid exhausting the search space between templates.

Current problem: The VR frequency at 25 kHz cycles through all 65536 versions in ~2.6 seconds, regardless of whether all nonces per version have been checked. That's why I'm seeing duplicate shares within seconds on hardware.

The register 0x10 value needs to be tuned so that the version rolling rate matches the ASIC's actual nonce scanning speed - ensuring the full 2^48 search space is covered before wrapping around. And for devices above ~2.5 TH/s, ntime rolling support would be needed on top.
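The arithmetic behind these numbers can be reproduced with a short sketch. The function names are hypothetical and only illustrate the division of search space by hashrate; they are not part of the firmware.

```cpp
#include <cassert>
#include <cmath>

// Seconds until a search space of 2^space_bits hashes is exhausted
// at the given hashrate (in hashes per second).
static inline double seconds_to_exhaust(int space_bits, double hashrate_hps) {
    assert(space_bits >= 0 && hashrate_hps > 0.0);
    return std::ldexp(1.0, space_bits) / hashrate_hps; // 2^space_bits / hashrate
}

// Seconds for a version counter stepping at vr_hz to cycle through
// all 2^16 rollable versions.
static inline double version_cycle_seconds(double vr_hz) {
    assert(vr_hz > 0.0);
    return 65536.0 / vr_hz;
}
```

With 48 bits of search space (32 nonce bits plus 16 version bits), the hashrates from the table give roughly 113 s, 59 s and 31 s, and a 25 kHz version rolling frequency cycles through all versions in about 2.6 s.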

@shufps
Owner

shufps commented Apr 5, 2026

> So the question is: can we adjust register 0x10 to give the ASIC enough nonce + version rolling search space for 30-60 seconds of autonomous mining?

IMHO no, I often read something about partitioning and how people believe it's working but I couldn't confirm any of that and I always saw it as some big misunderstanding.

Partitioning works via the chip ID and some people think that distributing the chip ID evenly across the possible 127 IDs, like Bitaxe is doing it, makes it use more search space automagically but this is imho a misconception and I never saw anything that would confirm that it's like that.

This is clearly visible when checking the chip ID of the nonce, it's always the ID that is set during the initialization and not some ID in between of two ASICs.

And the 0x10 needs some balancing between ASIC frequency, version rolling frequency and job interval times.

Just using 0x10 to try to give more time won't work and especially not in 1.5s+ range.

@warioishere
Contributor Author

@shufps fair points about partitioning - I'm not claiming to understand the ASIC internals better than you do.

But the empirical evidence is clear: on the Bitaxe with the same BM1368/BM1370 ASICs, SV2 Standard Channel works with the register 0x10 change from Bitaxe ESP-Miner PR 420. Without it, duplicate shares within seconds. With it, stable mining. Same ASIC, same pool, same protocol.

That PR was written by @adammwest who has deep knowledge of these ASICs. I'd really value his input here on what register 0x10 actually controls and why the HCN calculation makes Standard Channel work on Bitaxe hardware.

If there's a fundamental difference in how NerdQAxePlus configures the ASICs vs Bitaxe that makes this approach not work here, I'd like to understand that too. Happy to do more testing on hardware to figure this out.

@shufps
Owner

shufps commented Apr 5, 2026

> If there's a fundamental difference in how NerdQAxePlus configures the ASICs vs Bitaxe that makes this approach not work here, I'd like to understand that too. Happy to do more testing on hardware to figure this out.

The main difference is how the chip IDs are set - maybe I'm wrong and everything works completely different than I figured out lol ... Would really surprise me because the picture that I have mentally is very consistent with all I have learned within the last 2 years 😂

@warioishere
Contributor Author

warioishere commented Apr 5, 2026

@shufps totally respect your 2 years of experience here! And the chip ID difference could well be a factor.

But I can reproduce this directly on my Bitaxe (same BM1368 and 1370 ASIC):

  • Bitaxe firmware WITHOUT PR 420: SV2 Standard Channel → duplicate shares within seconds, same behavior as on NerdQAxePlus right now
  • Bitaxe firmware WITH PR 420: SV2 Standard Channel → no duplicate shares, stable mining, search space is extended

Same device, same pool, same ASIC - just the register 0x10 value changes. So whatever the HCN calculation does to that register, it measurably extends the search space on the Bitaxe.

Would be interesting to test the same register value on NerdQAxePlus and see if the chip ID difference matters or not.

@shufps
Owner

shufps commented Apr 5, 2026

> Would be interesting to test the same register value on NerdQAxePlus and see if the chip ID difference matters or not.

I guess the best test would be to just copy how they set the chip IDs.

Would be a couple of changed lines. I guess any AI could do that in 5 minutes.

But there really is nothing else I could imagine that could make a difference / be different.

@warioishere
Contributor Author

Good idea 👍 We'll run some tests - both with the Bitaxe-style chip ID setup and with the HCN register value - and report back with the results.

@mutatrum

mutatrum commented Apr 5, 2026

There might be something in the middle. What surprised me a long time ago is that with Bitaxe PR 420, the nonce space exhausts in the same time no matter the frequency. TBF, this needs to be re-verified as it's from over a year ago, but that would mean Bitaxe also misses a component in the init somewhere. Maybe we both have a partial picture: each approach works, but neither is the complete picture.

- Chip ID distribution: 256 / chip_counter (instead of hardcoded 2/4)
- m_addressInterval member for consistent chip addressing
- Nonce-to-ASIC mapping: (bswap32(nonce) >> 17) / address_interval
- chipIndexFromAddr uses address_interval (removed BM1370 override)
- All per-chip CMD_WRITE_SINGLE use address_interval
- Disabled checkVrFrequencyChanged (overwrites HCN on register 0x10)
@warioishere
Contributor Author

warioishere commented Apr 5, 2026

@shufps here are our test results - you were right about the chip IDs being the key factor:

Test Results (NerdQAxe++, 4x BM1370 @ 615MHz, SV2 Standard Channel)

Test 1: HCN on register 0x10, original chip IDs (0,4,8,12)

Result: All 4 ASICs hash at full speed (~5 TH/s). V1 shares accepted, no regression. But in Standard Channel: first few unique shares accepted, then duplicates within seconds. Each nonce reported 4x simultaneously - all 4 chips find the same nonce because they search the same nonce space.

Test 2: Same as Test 1 + disabled checkVrFrequencyChanged (was overwriting register 0x10)

Result: Same duplicate pattern. Confirmed the VR frequency overwrite was a problem, but HCN value alone doesn't partition the nonce space between chips.

Test 3: HCN + Bitaxe-style chip IDs (0,64,128,192) + updated nonce-to-ASIC mapping

Result: All 4 ASICs hash, shares accepted, no duplicate shares, stable mining on Standard Channel! Chip ID distribution is the key to nonce partitioning.

Changes needed:

  • address_interval = 256 / chip_counter (instead of hardcoded 2 or 4)
  • All per-chip CMD_WRITE_SINGLE commands use i * address_interval
  • Nonce-to-ASIC mapping: ((bswap32(nonce) >> 17) & 0xff) / address_interval
  • chipIndexFromAddr: addr / address_interval (removed BM1370 override that used addr >> 2)
  • Register 0x10: HCN value from setNonceSpace() instead of VR frequency
  • checkVrFrequencyChanged disabled (was overwriting HCN on register 0x10)
  • HCN recalculated on ASIC frequency change

This also works for multi-chip boards like OCTAXE (8 chips → address_interval=32).
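The chip addressing and nonce-to-ASIC mapping from the list above can be sketched like this. The helper names are hypothetical, and `bswap32` is written out portably here as a stand-in for whatever byte-swap the firmware uses; the bit positions follow the expression quoted in the list.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap (stand-in for the firmware's bswap32).
static inline uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Spacing between chip addresses: 256 / chip_counter.
static inline uint16_t address_interval(uint16_t chip_counter) {
    assert(chip_counter > 0 && chip_counter <= 256);
    return static_cast<uint16_t>(256 / chip_counter);
}

// Address written to chip i: i * address_interval.
static inline uint8_t chip_address(uint16_t i, uint16_t interval) {
    return static_cast<uint8_t>(i * interval);
}

// Map a reported nonce back to the index of the chip that found it:
// ((bswap32(nonce) >> 17) & 0xff) / address_interval
static inline uint16_t chip_index_from_nonce(uint32_t nonce, uint16_t interval) {
    assert(interval > 0);
    const uint32_t addr = (bswap32(nonce) >> 17) & 0xFFu;
    return static_cast<uint16_t>(addr / interval);
}
```

For four chips this yields addresses 0, 64, 128 and 192, and for eight chips an interval of 32, matching the values reported in the tests.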

Open questions

VR frequency feature

checkVrFrequencyChanged writes to the same register 0x10 that HCN uses. Currently disabled to prevent overwriting HCN. Should we remove the VR frequency UI feature entirely, or is there a way to combine both?

NerdOCTAXE and NerdQAxe++ need ntime rolling

The full nonce + version rolling search space lasts ~59 seconds at 4.8 TH/s and ~31 seconds at 9 TH/s. Most pools send new templates every 30-60 seconds. For the NerdQAxe++ and OCTAXE (and future faster devices), we need ntime rolling to avoid exhausting the search space between templates. Our plan: increment ntime every 5 seconds, giving enough headroom for overclocking and future higher-hashrate boards. With 60s template intervals that's max 12 ntime increments - well within consensus tolerance.

Calculates how long the ASIC needs to exhaust the full nonce+version
search space. Used to determine when ntime needs to be incremented
for Standard Channel on multi-chip boards.
@warioishere
Contributor Author

warioishere commented Apr 5, 2026

Update on the ntime rolling from our last post: instead of a fixed 5 second interval we went with a dynamic approach. We ported calculate_bm_timeout_ms() from ESP-Miner PR 420 which calculates how long the search space actually lasts based on frequency, cores and ASIC count. We roll ntime at 80% exhaustion.

Tested on both NerdQAxe++ and NerdOCTAXE-Gamma with SV2 Standard Channel, no duplicates:

I (59205) mining_info_v2: ntime roll #1: ntime=1775422187 (search space 25.0s exhausted)

NerdQAxe++ rolls at ~46s (80% of ~57s), OCTAXE at ~25s (80% of ~31s). The counter resets on each new pool template so you mostly just see "#1" with 30s template intervals.
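The rolling policy described here (roll at 80% of the exhaustion time, reset the counter on each new template) might look roughly like the sketch below. `NtimeRoller` and its methods are hypothetical names for illustration; the exhaustion time is passed in rather than computed, since the internals of `calculate_bm_timeout_ms()` are not shown in this thread, and incrementing ntime by one per roll is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decides when to bump ntime for a long-running job.
class NtimeRoller {
public:
    // exhaust_ms: time until the nonce+version space is exhausted.
    // roll_fraction: roll once this fraction has elapsed (0.8 in the text above).
    NtimeRoller(double exhaust_ms, double roll_fraction)
        : threshold_ms_(exhaust_ms * roll_fraction) {
        assert(exhaust_ms > 0.0 && roll_fraction > 0.0 && roll_fraction <= 1.0);
    }

    // A new pool template resets the roll counter and the timing window.
    void on_new_template(uint32_t ntime, double now_ms) {
        base_ntime_ = ntime;
        rolls_ = 0;
        window_start_ms_ = now_ms;
    }

    // Returns true if ntime was rolled at this poll.
    bool poll(double now_ms) {
        if (now_ms - window_start_ms_ < threshold_ms_) {
            return false;
        }
        ++rolls_;                  // assumption: one ntime step per roll
        window_start_ms_ = now_ms; // the search space starts over
        return true;
    }

    uint32_t current_ntime() const { return base_ntime_ + rolls_; }
    uint32_t rolls() const { return rolls_; }

private:
    double threshold_ms_;
    double window_start_ms_ = 0.0;
    uint32_t base_ntime_ = 0;
    uint32_t rolls_ = 0;
};
```

With an exhaustion time of about 31 s, the threshold lands near 25 s, so a 30 s template interval would see at most one roll before the counter resets, which is consistent with the log line above.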

This can't go into this PR because it depends on code from both this PR and our SV2 PR (#544). Will follow as a separate PR once both are merged, code is ready on our test branch.

https://github.com/warioishere/ESP-Miner-NerdQAxePlus/tree/test/sv2-nonce-space-v2

VR frequency question still open.

@shufps
Owner

shufps commented Apr 6, 2026

Hmm interesting ... so it seems the nonce space is really automagically evenly partitioned between the chip IDs - I always thought that wouldn't happen but it seems I was wrong all that time^^

But there is another problem, the dual pool scheduling relies on jobs being switched in short time like 500ms.

Letting a job run for 30s or so somewhat breaks this.

Btw do we need to support Standard?

Pragmatic approach would be just to use Extended -> voila problem solved.

wdyt?

@warioishere
Contributor Author

On the Dual Pool concern - the chip ID and HCN changes sit on the ASIC driver level but job switching is controlled higher up. V1 and Extended keep sending new work every 500ms regardless of HCN, Dual Pool not affected. Standard Channel has a m_jobSent flag that stops resending, and Dual Pool is already blocked for Standard Channel in the UI anyway.

About whether we need Standard Channels - actually yes, ideally we should support them. The SV2 spec designed Standard Channels for end-mining devices doing Header-Only Mining. They just get a ready Merkle Root and hash, no coinbase/extranonce handling needed. Extended Channels are actually meant for proxies, not end devices. When a miner opens an Extended Channel directly to a pool it's essentially doing the proxy's job - computing coinbase hashes, walking merkle paths, managing extranonce. Works fine but it's a workaround.

The real use case for Standard Channels: someone running a JDC (Job Declarator Client) for template control. The JDC acts as a local proxy, opens an Extended Channel upstream and feeds Standard Jobs to downstream miners. Our devices could connect to the JDC via Standard Channel and just hash headers - that's the intended SV2 architecture, which also works with your own JD-Client in between to control your own block template. The SV2 guys are already doing some great work here to make this setup quite easy:

https://github.com/stratum-mining/sv2-ui

For connecting directly to a pool without proxy, Extended is the practical choice. But supporting both means the devices work best in both scenarios. And Dual Pool stays disabled for Standard Channel, no conflict there.

warioishere changed the title from "Draft: Dynamic ASIC nonce space calculation (register 0x10)" to "Draft: control nonce space and timeouts for all chip topologies" on Apr 6, 2026
@shufps
Owner

shufps commented Apr 6, 2026

> For connecting directly to a pool without proxy, Extended is the practical choice. But supporting both means the devices work best in both scenarios. And Dual Pool stays disabled for Standard Channel, no conflict there.

Hmm interesting, thx for your explanation.

But chances to get it accepted and merged might be a lot lower with Standard Channel support and all the required changes for it 😉

But it’s really weird. It’s like the Standard Channel was invented completely out of touch with reality.

It's so weird that I actually wouldn't like to support it at all, who cares, Extended Channel is supported too and can do exactly what is needed.

@warioishere
Contributor Author

Just curious - what makes Standard Channel feel out of touch for you? The way we see it, Standard Channel + JDC is kind of the whole point. The miner just hashes headers, the JDC proxy does all the heavy lifting (template construction, extranonce management, merkle computation). The miner firmware stays dead simple and you get template control over your own Bitcoin node. That's the core SV2 decentralization use case.

But hey, if Standard Channel is a blocker for merging we're happy to remove it from the SV2 PR and keep it Extended-only. We can always add it later. The nonce space changes in this PR are independent anyway.

cc @GitGab19 @plebhash curious about your thoughts on Standard Channel for small miners / JDC setups

@shufps
Owner

shufps commented Apr 6, 2026

> Just curious - what makes Standard Channel feel out of touch for you?

Hmm, I think Standard Channel is perfectly fine for very slow devices like NerdMiners. But once you have to abuse ntime as a replacement, hack, or workaround for extranonce2, that feels like a sign that something in the design is off. 😅

It is more or less a coincidence that BM* chips seem to be able to extend their search space beyond their actual chip ID, so this still works in practice. My impression is that other ASICs may not be able to do that as easily, or at all.

A well-thought-out protocol should have taken into account that the available search space can be exhausted very quickly on certain hardware.

So my view is not that Standard Channel is useless — just that it may simply not be the best fit for BM* ASIC miners. 🤷

For that reason I would tend to just ignore it for now and only support Extended Channel, which is exactly what we need here.

It would also keep the required SV2 changes less invasive — leaving the core code untouched, avoiding potential regressions, and eliminating any weird special-casing based on which protocol is active.

If there's ever a concrete need to support Standard Channel down the line, future-us can revisit it then.

@warioishere
Contributor Author

ntime rolling isn't really a hack – the ntime field is part of the block header,
and consensus rules give a wide window for it.

The SV2 spec explicitly mentions that miners may need to roll ntime when the
search space is exhausted, and that the upstream node should send new jobs
frequently enough based on the miner's hashrate.

Behind a JDC proxy, the search space issue mostly goes away anyway. The JDC
doesn't send templates every 30–60 seconds like pools do – it has triggers
based on mempool fee thresholds that push new templates much more frequently.

So the miner gets fresh Standard Jobs often enough that ntime rolling rarely
kicks in. It's just a safety measure so that if no template arrives from a JD-Client, the miner doesn't run out of search space.

I actually don't know what's wrong with utilizing the full potential of the ASICs instead of just using endless job creation to compensate for a wrongly configured ASIC. That is what I call a "hack".

But yeah, Extended-only for now works for us. I see I cannot really convince you here. I just want to push SV2 because it has exceptional advantages over SV1 and we have been using an outdated protocol for too long. I have tagged plebhash and gitgab19, the lead devs of SRI/SV2; maybe they can explain better than me that Standard Channels are not only useful for NerdMiners.

I will disable Standard Channels on the SV2 PR for the meantime.

@shufps
Owner

shufps commented Apr 6, 2026

> I actually don't know what's wrong with utilizing the full potential of the ASICs instead of just using endless job creation to compensate for a wrongly configured ASIC. That is what I call a "hack".

Nothing against the approach itself — but all the surrounding code was written around how the ASICs are currently configured, and just changing that has ripple effects. The best example is probably the dual pool scheduler, which is a deterministic scheduler built on a (short, ~500ms) fixed job interval. Changing that interval could lead to weird pool hashrate statistics or problems regulating pool difficulty.

So it's not that the idea is wrong, it's just that the cost of changing it outweighs the benefit for now. It is what it is, sorry 🤷

@warioishere
Contributor Author

Just to be clear - the chip ID and HCN changes don't touch the job interval at all. V1 and Extended still send new work every 500ms, the dual pool scheduler runs exactly as before. The only difference is that each chip searches a unique nonce partition instead of all chips searching the same space. No ripple effects on scheduling or pool stats.

The dual pool incompatibility only applies to Standard Channel (which doesn't do 500ms job resends). We already block that combination in the UI - dual pool + standard channel is not selectable.

Anyway, we've disabled Standard Channel on the SV2 PR (#544) for now - both UI hidden and backend forced to Extended. Can revisit later if needed.

@shufps
Owner

shufps commented Apr 6, 2026

> Just to be clear - the chip ID and HCN changes don't touch the job interval at all. V1 and Extended still send new work every 500ms, the dual pool scheduler runs exactly as before. The only difference is that each chip searches a unique nonce partition instead of all chips searching the same space. No ripple effects on scheduling or pool stats.

Yes, that's what I understood from what you wrote — thanks for clarifying anyway. 👍

My arguments were maybe a bit mixed up. The register 0x10 and chip-ID reluctance is probably more pragmatism than a hard technical blocker — it would require updating log output, nonce histogram and similar things (I think even hashrate register reading) that all build on current assumptions and changing it for BM1366, BM1368 and BM1370 because the FW is used on multiple devices.

The main concern was really Standard Channel and dual pool compatibility, which you've already addressed by disabling it for now. So we're good. 🙂

@adammwest

@warioishere
ah you found an assumption of PR420

PR420 assumes that the address interval is 256/(2**ceil(log2(chain_length)))
reserving the minimum amount of bits for a particular chain length.
I think the functions of PR420 could be abstracted to use address_interval and chips found.

As for the chip interval (as far as my understanding goes):
I think the chip address interval reserves nonce space for chips. You used interval=4 in the current code;
this means 256/4 = 64, so a chain length of up to 64, or 6 bits reserved inside the nonce space.

https://github.com/shufps/ESP-Miner-NerdQAxePlus/blob/develop/components/bm1397/bm1370.cpp#L75

    // set chip address
    for (uint8_t i = 0; i < chip_counter; i++) {
        setChipAddress(i * 4);
    }

But as there are 4 chips in the NerdQAxe++ (I assume chain length 4), you can claim only 4 of the 64 reserved slots.

So after reserving, the nonce range per slot becomes 2^32 / 64 = 2^26 (not in any particular order/bit representation, just size).
You then claim 4 slots, so you get 4 * 2^26 = 2^28 of mineable nonce space, or 6.25%.
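The slot arithmetic above can be checked with a small sketch. The function names are hypothetical and simply restate the reasoning: the address interval fixes how many slots the 8-bit address range is divided into, and only claimed slots contribute mineable nonce space.

```cpp
#include <cassert>
#include <cstdint>

// Number of address slots reserved in the 8-bit chip address range.
static inline uint32_t reserved_slots(uint32_t address_interval) {
    assert(address_interval > 0 && address_interval <= 256);
    return 256 / address_interval;
}

// Nonces available to a single slot: 2^32 / slots.
static inline uint64_t nonces_per_slot(uint32_t address_interval) {
    return (uint64_t{1} << 32) / reserved_slots(address_interval);
}

// Fraction of the 2^32 nonce space that is mined when `chips` slots are claimed.
static inline double mineable_fraction(uint32_t address_interval, uint32_t chips) {
    assert(chips <= reserved_slots(address_interval));
    const uint64_t mined = chips * nonces_per_slot(address_interval);
    return static_cast<double>(mined) / 4294967296.0; // divide by 2^32
}
```

With interval 4 and four chips this gives 0.0625 (6.25%), while interval 64 with four chips covers the whole 32-bit nonce space.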

@shufps
for the 0x10 register it is just nonce_percent register that is variable in size
the max value is based on freq of the chip and the address_interval
it is not bounded at 0-100% it can go over 100%, and you get reduced performance.

if you make the register smaller the roll over will be faster because the nonce space is smaller.
make it bigger and the roll over time will be longer.

if you only change freq the roll over time is the same, as the chip frequency changes the max size of 0x10.

> The only case I found where tweaking 0x10 becomes necessary is when the ASIC clock is so high that the search nonce wraps around in the search space before the version counter is incremented, leading to duplicate shares.
> Increasing the VR-frequency fixes it then.
> I only observed that at ASIC frequencies of around 1100MHz and higher.

This is an interesting observation. According to my understanding the frequency must be bounded, as it changes the size of the nonce space proportionally to freq, so with 4 chips and a high freq maybe you found an example of that frequency limit. It should have an upper bound.

@adammwest

bitcoin/bips#2116
seems relevant for the standard channel discussion

==Motivation==
BIP 320 defined 16 bits of nVersion as nonce space for additional nonce space. It turns out that
this isn't enough, as some devices have started using 7 bits from nTime for extra nonce space (see
stratum-mining/sv2-spec#187). Given there's limited utility in 16
bits of nVersion space for signaling, instead here we offer 24 bits of nVersion space as extra
nonce space.

==Rationale==
Headers-only mining avoids mining devices (either ASICs or the firmware) from having to concern
themselves with the vast space of consensus logic (handling transactions, merkle trees, etc). It is
widely deployed in ASICs, but requires a substantial number of jobs fed across an entire device,
keeping the ASIC controller busy. Providing additional nonce space for the ASICs to roll without
needing fresh work from the controller may simplify ASIC design somewhat, and as been apparently
adopted in some miners by using extra space in nTime as extra nonce space. Doing so in nVersion
instead is preferable to using nTime

PR420 assumes address_interval = 256 / next_power_of_two(chip_counter)
to reserve the minimum bits for nonce space partitioning. For our
boards (4, 8 chips) the result is identical but this is correct
for non-power-of-two chain lengths.

Moved next_power_of_two to asic.h as static inline.
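A minimal sketch of the interval rule described in this commit message follows. `address_interval_pow2` is a hypothetical name; the actual helper in `asic.h` may differ in signature and types.

```cpp
#include <cassert>
#include <cstdint>

// Round up to the next power of two (returns v itself if it already is one).
static inline uint32_t next_power_of_two(uint32_t v) {
    assert(v >= 1 && v <= 256);
    uint32_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

// address_interval = 256 / next_power_of_two(chip_counter)
// Reserves the minimum number of address bits for the given chain length.
static inline uint32_t address_interval_pow2(uint32_t chip_counter) {
    assert(chip_counter >= 1 && chip_counter <= 256);
    return 256 / next_power_of_two(chip_counter);
}
```

For 4 and 8 chips the result equals plain `256 / chip_counter` (64 and 32). For a chain of 6 chips it gives 32, whereas plain integer division would give 42, which does not line up with a whole number of address bits.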
@plebhash

plebhash commented Apr 7, 2026

> But it’s really weird. It’s like the Standard Channel was invented completely out of touch with reality.

Even though @shufps already retracted this statement, I'll address it first:

I'm not one of the original Sv2 spec authors, but I know that Standard Channels existed in the Sv2 spec since its original draft.

There's a few arguments for the existence of Standard Channels in Sv2 spec:

  • enable Header-only Mining (HOM) on end-devices while pushing merkle_path+coinbase_tx_prefix+extranonce+coinbase_tx_suffix complexity upstream
  • smaller network bandwidth consumption due to absence of merkle_path+coinbase_tx_prefix+coinbase_tx_suffix on NewMiningJob (when compared to NewExtendedMiningJob) and absence of extranonce on SubmitSharesStandard (when compared to SubmitSharesExtended)
  • lighter share validation: validators can check shares against a precomputed job merkle_root instead of rebuilding it from merkle_path+coinbase_tx_prefix+extranonce+coinbase_tx_suffix for every share

Of course, there's always going to be tradeoffs in case an Extended Channel is being split into multiple Standard Channels somewhere along the mining stack. Nevertheless, the arguments above still hold in the general sense, even if weaker in such specific cases.


About Version Rolling:

Please note that while NewExtendedMiningJob has a version_rolling_allowed field, NewMiningJob does not. That's because version rolling is implied to always be mandatory on Standard Jobs.

Please let me know whether this is not clear from the spec, because it should be.

If some implementation skips version rolling on Standard Jobs (or doesn't do it to the full extent), then the search space will become smaller than it could have been, and share duplication will happen before job refresh or ntime is increased.

I have the impression that this is where confusion arose, and I'd be happily open to feedback in case anyone thinks we can make this more explicitly clear in the Sv2 spec.


About hard hashrate ceiling:

Although we aim for Sv2 spec to be a canonical document that's "written in stone", it already had to undergo many adjustments over time. So it's not necessarily perfect as-is.

The aspect that's admittedly still a bit unpolished is the hashrate threshold for Header-only Mining (HOM), because 280TH/s is likely going to become somewhat "obsolete" for industrial-scale mining in the near future.

As @adammwest pointed out above, there's efforts to expand the number of rollable version bits, which should raise this threshold beyond 280TH/s and solve this problem:

The alternative approach to expand Standard Job search space is by rolling ntime (as in actual rolling, not just increasing it after 1s has elapsed).

While theoretically possible (as in consensus valid), if applied at scale this approach could have unintended consequences on network difficulty adjustment and IMO should be discouraged in the community: stratum-mining/sv2-spec#187


> cc @GitGab19 @plebhash curious about your thoughts on Standard Channel for small miners / JDC setups

Even though I understand why/how @warioishere arrived at this conclusion, I wouldn't necessarily frame Standard Channels as something that's only beneficial to small miners or JDC use-cases.

The range of legitimate use-cases is broader, and could eventually bring real benefits to the industry (reduction in network-bandwidth and compute) if/when applied at scale.

But yeah, there's a few moving parts with regards to Version Rolling and Sv2 spec polishing, which understandably cause confusion.

@shufps
Owner

shufps commented Apr 8, 2026

> Of course, there's always going to be tradeoffs in case an Extended Channel is being split into multiple Standard Channels somewhere along the mining stack. Nevertheless, the arguments above still hold in the general sense, even if weaker in such specific cases.

Standard Channel is actually something I wanted to have for other things - like some solar powered light-weight LORA miner that shouldn't do any crypto on its own. The aim always was to just send a header as workload and to just let it mine on that.

There was one puzzle piece left I couldn't answer myself though, it's BM* related: how can I let it mine for longer than 1.5s before it wraps around in the search space?

Sending new headers every 1.5s via LORA is a no-go (not fair use anymore) and sending an entire mining.notify is too big (basically same problem).

But the 0x10 register that Adam explained above might be the answer for that. It could make a single ASIC just mine for a couple of minutes when search space has been extended over multiple chip-ID bits in the nonce.

Please don't get me wrong, I don't say Standard Channel is useless - I just have the feeling it might not be the best fit for this particular project but might be a game changer for others 😅

And thx a lot for taking your time explaining all of this! 🙌

@mutatrum

mutatrum commented Apr 8, 2026

The BM1370 can mine for several minutes without a new job, if you extend the full nonce range. The BM1366/BM1368 probably similar? Anything before that will not even get to a second.

@plebhash

plebhash commented Apr 8, 2026

There was one puzzle piece left I couldn't answer myself though, and it's BM* related: how can I let it mine for longer than 1.5s before it wraps around in the search space?

can't you increase ntime after 1s has elapsed?

that should safely reset the search space, and it's one of the main assumptions behind the 280 TH/s ceiling calculation

(sorry if this is a dumb or uninformed question, I haven't really parsed all the details in this discussion!)

@shufps
Owner

shufps commented Apr 8, 2026

(sorry if this is a dumb or uninformed question, I haven't really parsed all the details in this discussion!)

It's not a dumb question at all! I thought about it too ... but ...

Yes, that would work, but it would be a special case that needs different treatment in the code (rolling ntime instead of enonce2), and I'm not convinced that supporting Standard Channel is really worth the effort when Extended would just work 🙊

And I haven't had a deeper look at whether ntime provides enough room to roll for an effective job switching time of ~500ms, which would be required to work properly with the dual pool feature.

@adammwest

The BM1370 can mine for several minutes without a new job if you extend the full nonce range. The BM1366/BM1368 are probably similar? Anything before that will not even get to a second.

Small detail:
BM1385/87/97: NO
BM1398: YES, I know someone who did it
BM1362/66/68/70: YES, I checked

Comment thread components/bm1397/asic.cpp Outdated

void Asic::setNonceSpace(float frequency, uint16_t asic_count, uint16_t cores) {
int cores_up = next_power_of_two(cores);
int asic_count_up = next_power_of_two(asic_count);

This is the part that assumes the chip address interval

 int asic_count_up = next_power_of_two(asic_count);

would need to be

 int asic_count_up = 256/address_interval;

That would mean you can have any address interval (in theory)
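
To make the difference concrete, here is a minimal sketch comparing the two ways of sizing the chip portion of the nonce space. The helper names `chipSlotsFromCount` and `chipSlotsFromInterval` are hypothetical (they are not in the PR), and the sketch assumes an 8-bit chip address, which is what `256/address_interval` implies.

```cpp
#include <cassert>
#include <cstdint>

// Round up to the next power of two (returns 1 for inputs 0 and 1).
static uint32_t next_power_of_two(uint32_t n) {
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Original approach: infer the number of chip slots from the chip count.
// This silently assumes the address interval is 256 / next_pow2(asic_count).
static uint32_t chipSlotsFromCount(uint16_t asic_count) {
    return next_power_of_two(asic_count);
}

// Suggested approach: derive the number of chip slots from the address
// interval that was actually programmed (assumed 8-bit chip address).
static uint32_t chipSlotsFromInterval(uint16_t address_interval) {
    if (address_interval == 0) {
        return 0;  // invalid configuration, avoid dividing by zero
    }
    return 256u / address_interval;
}
```

With 4 chips and an address interval of 64 both helpers return 4. With the same 4 chips and an address interval of 2, the interval-based helper returns 128 while the count-based one still returns 4, which is the mismatch described above.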

Contributor Author

@warioishere warioishere Apr 8, 2026

Good catch, fixed - now using 256/m_addressInterval instead of next_power_of_two(asic_count). Testing this now.

Separate observation: with the nonce space changes we see more hardware duplicate shares (~0.19% vs ~0.03% without). We compared the BM1370 init with Bitaxe early-access and tried removing register 0x68 (not present in Bitaxe) and the extra 0xA4 write after setNonceSpace - didn't help.

Any idea what could cause chips to find the same nonce+version more often with wider address intervals? Happens on both V1 and SV2 Extended and Standard so it's not protocol related.


@adammwest adammwest Apr 8, 2026

Is the 0.19% real? We need thousands of shares to be certain of this.

But for the Gamma there was a theorised issue: supposedly the HCN is too big by 268.

If you ran at 615 MHz and address_interval is 256/4 = 64, then

100 - 100 * 268 / (2^25 * 25 / 615 / 4 * 0.5) = 99.85

i.e. about 0.15% dups, which is close to 0.19%.

It should scale worse when hcn_max shrinks, as we have (HCN - 268) / HCN_MAX, so 268 / HCN_MAX = duplicates.

If you can test (I don't have the NerdQ device):

address_interval = 2
hcn = hcn_max
freq = 615
hcn_max computed with `int asic_count_up = 256/address_interval;`

Expected is

100 - 100 * 268 / (2^25 * 25 / 615 / 128 * 0.5) = 94.97

i.e. about 5% dups.

In that case I need to update PR420 as well.

For the Gamma case the solution would be to do

// HW errata of 134 per half clock cycle 
int hcn = hcn_max-268;
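
The arithmetic above can be reproduced with a short sketch. It uses the HCN formula from the PR description with `256/address_interval` as the chip divisor; the function names and the constant `HCN_ERRATA` are hypothetical, and the 268 offset is the theorised value from this thread, not a documented figure.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr double FREQ_MULT = 25.0;  // multiplier used in the formula above
constexpr int HCN_ERRATA = 268;     // theorised overshoot (2 x 134), an assumption

// Round up to the next power of two (returns 1 for inputs 0 and 1).
static uint32_t next_power_of_two(uint32_t n) {
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// hcn_max = (2^32 / next_pow2(cores) / (256 / address_interval))
//           * FREQ_MULT / freq * 0.5
static double hcnMax(double freq_mhz, uint16_t address_interval, uint16_t cores) {
    const double cores_up = static_cast<double>(next_power_of_two(cores));
    const double chip_slots = 256.0 / address_interval;
    return (4294967296.0 / cores_up / chip_slots) * FREQ_MULT / freq_mhz * 0.5;
}

// Predicted share of duplicates (in percent) if HCN is left at hcn_max.
static double predictedDuplicatePercent(double hcn_max) {
    return 100.0 * HCN_ERRATA / hcn_max;
}

// HCN value with the proposed correction applied.
static int correctedHcn(double hcn_max) {
    return static_cast<int>(hcn_max) - HCN_ERRATA;
}
```

For a BM1370 (128 cores) at 615 MHz this gives roughly 0.16% predicted duplicates with an address interval of 64 and roughly 5.03% with an address interval of 2, matching the two estimates above.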

Contributor Author

@warioishere warioishere Apr 8, 2026

The 0.19% is from two NerdQAxe++ devices running side by side for 2 days, both over 50k shares. One with the nonce space patch, one without. Pretty consistent numbers.

Your math lines up almost perfectly with what we see. We'll test with address_interval=2 and hcn=hcn_max at 615MHz to verify the 5% prediction. If that confirms it we'll add the -268 correction.

Contributor Author

@warioishere warioishere Apr 8, 2026

Correction: 0.19% was total rejects. Actual hardware duplicates are ~0.16% (0.19% minus the 0.03% baseline from the devices without the patch). That lines up even closer with your 0.15% calculation. Building the address_interval=2 test now.

Contributor Author

Results with address_interval=2, hcn=hcn_max, freq=615MHz, 11h runtime, ~11100 shares: ~1.87% duplicates (1.90% total minus 0.03% baseline from devices without the patch).

That's about a third of your predicted 5%. The errata offset might be smaller than 268, or it scales differently than expected.

Contributor Author

update: the duplicates come in bursts, not evenly distributed. Just jumped from 1.87% to 1.93% after a cluster. Still climbing slowly. Maybe the overlap only triggers under certain timing conditions, not on every nonce wrap.


update: the duplicates come in bursts, not evenly distributed. Just jumped from 1.87% to 1.93% after a cluster. Still climbing slowly. Maybe the overlap only triggers under certain timing conditions, not on every nonce wrap.

That's expected for an HCN that is too big. There are 2 distinct types of duplicates: wrap-around duplicates, when the space ends and restarts, and what I call internal dups (maybe "overlapping range duplicates" is a better name).

Essentially the chip encodes info (core, chip) in some part of the nonce range, and the HCN can overwrite this. What you end up with is a portion of the nonce range that overlaps, so you get solutions that appear very close together in time.

Imagine a fictitious scenario of a chip with 2 cores and a total nonce range of 256, with a spacing of 128, where we set 130 as the size per core.

| Core | Start | End | Range |
|---|---|---|---|
| Core 0 | 0 | 130 | 0 -> 130 |
| Core 1 | 128 | 256 | 128 -> 256 |

We cover 100% of the range, but both cores are assigned the overlapping range 128 -> 130, so we end up with some dups that come back at approximately the same time.
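
The toy scenario can be sketched in a few lines. This is only a model of the example above, not of the real chip: the `CoreRange` struct and both helpers are hypothetical, and clipping the last core's range at the total is an assumption made to match the table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Half-open nonce range [start, end) scanned by one core.
struct CoreRange {
    uint32_t start;
    uint32_t end;
};

// Range for `core` when cores start `spacing` nonces apart and each scans
// `size` nonces, clipped to the total range (modelling assumption).
static CoreRange coreRange(uint32_t core, uint32_t spacing, uint32_t size, uint32_t total) {
    const uint32_t start = core * spacing;
    const uint32_t end = std::min(start + size, total);
    return CoreRange{start, end};
}

// Number of nonces that both ranges cover, i.e. potential duplicates.
static uint32_t overlap(const CoreRange& a, const CoreRange& b) {
    const uint32_t lo = std::max(a.start, b.start);
    const uint32_t hi = std::min(a.end, b.end);
    return hi > lo ? hi - lo : 0;
}
```

With a spacing of 128 and a size of 130, `overlap(coreRange(0, 128, 130, 256), coreRange(1, 128, 130, 256))` is 2 nonces; with a size of 128 the overlap is 0.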

Thank you for the test!
I will update PR420 and use 268 to be safe.

Contributor Author

[image]

You are a genius! I can already say this fixed the problem: no more dups at all, even on Standard Channels!

@warioishere
Contributor Author

warioishere commented Apr 8, 2026

(sorry if this is a dumb or uninformed question, I haven't really parsed all the details in this discussion!)

It's not a dumb question at all! I thought about it too ... but ...

Yes, that would work, but it would be a special case that needs different treatment in the code (rolling ntime instead of enonce2), and I'm not convinced that supporting Standard Channel is really worth the effort when Extended would just work 🙊

And I haven't had a deeper look at whether ntime provides enough room to roll for an effective job switching time of ~500ms, which would be required to work properly with the dual pool feature.

I am still not sure you understood my proposal. Dual pool mode would just not have been available for Standard Channels when implementing this PR; it still works on Sv1 and Sv2 Extended. Both still use job switching at a 500ms interval. Only Standard doesn't use it, and we could also add a tooltip explaining this to the user.

As adammwest pointed out, the nonce space should be derived from
the actual address_interval (256/interval) rather than
next_power_of_two(asic_count). This correctly handles any chip
address configuration.
@shufps
Owner

shufps commented Apr 9, 2026

Dualpool mode would just not have been available for Standard channels, when implementing this PR

why would we add something that needs disabling another feature when it's actually not really needed?

I guess we could discuss longer about this without ever coming to consensus 😅

No Standard Channel for now but I'll have a look at this PR and "fix" the nonce space issue (well actually adjust everything else that gets broken by changing the chip IDs - if it's not already been fixed by the PR ofc)

But everything that was about "version rolling frequency" can be removed then. In the web UI too because it's not needed anymore then.

It was there for adjusting the "frequency" so that on the QX there are no duplicates.

But Adam seems to be right and everything I did was BS in this case 😅

@warioishere
Contributor Author

warioishere commented Apr 9, 2026

Dualpool mode would just not have been available for Standard channels, when implementing this PR

why would we add something that needs disabling another feature when it's actually not really needed?

I guess we could discuss longer about this without ever coming to consensus 😅

No Standard Channel for now but I'll have a look at this PR and "fix" the nonce space issue (well actually adjust everything else that gets broken by changing the chip IDs - if it's not already been fixed by the PR ofc)

But everything that was about "version rolling frequency" can be removed then. In the web UI too because it's not needed anymore then.

It was there for adjusting the "frequency" so that on the QX there are no duplicates.

But Adam seems to be right and everything I did was BS in this case 😅

It's still a work in progress anyway, and maybe a proof of concept to get the ASIC working as intended and to help understand the ASIC better. No need to hurry on anything here. As you said, and I am fine with that, Extended Channels are good for now.

@Sjors

Sjors commented Apr 9, 2026

Yes would work but that would be some kind of special case that needs different treatment in the code (rolling ntime instead of enonce2)

Why "instead"? Wouldn't it make sense to unconditionally bump nTime every second? It keeps the timestamp accurate (that's more of an OCD thing than a real requirement of course).

134 per half clock cycle = 268 nonce overlap between adjacent cores.
Without correction ~0.15% duplicate shares on 4-chip boards.
@warioishere
Contributor Author

warioishere commented Apr 9, 2026

@Sjors good point - bumping ntime every second unconditionally makes sense. It's not aggressive rolling, just keeping the timestamp accurate. And it gives a natural job refresh point for all modes.

@shufps this would also solve the dual pool concern for Standard Channel - every second the ntime increments, giving you a natural switching point between pools. No 500ms job resend needed, just pick the right pool on each ntime tick. Don't get me wrong, just discussing :) I won't change anything on the SV2 PR anymore :)
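
As a rough illustration of the unconditional bump, the header timestamp could be derived from the job's original ntime plus the whole seconds elapsed since the job arrived. The helper `rolledNtime` and its parameters are hypothetical and not part of this PR; the sketch also ignores any limit a pool may place on how far ntime can be rolled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the ntime to put in the block header for a job
// received at `job_received_ms`, bumped by one for every full second elapsed.
static uint32_t rolledNtime(uint32_t job_ntime, uint64_t job_received_ms, uint64_t now_ms) {
    if (now_ms <= job_received_ms) {
        return job_ntime;  // clock did not advance, keep the original timestamp
    }
    const uint64_t elapsed_s = (now_ms - job_received_ms) / 1000;
    return job_ntime + static_cast<uint32_t>(elapsed_s);
}
```

Each change in the returned value would mark one of the "ntime ticks" mentioned above, since a new ntime yields a different header and therefore a fresh search space.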

Remove all VR frequency infrastructure (vrFreqToReg, vrRegToFreq,
setVrFrequency, calculateSearchSpaceMs, getDefaultVrFrequency,
NVS storage, HTTP API, Web UI) since the HCN-based nonce space
calculation in setNonceSpace() replaces it correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shufps
Owner

shufps commented Apr 10, 2026

Removed everything with version rolling frequency^^

I'll check on the QX with >1100MHz if I see duplicates, if not I would merge this.

edit: interesting, I don't seem to get duplicates at a job time of 10s on an Octaxe and I can confirm that the nonces use the other previously unused chip ID bits in the nonce too, so it really seems the change has extended the search space per chip.

btw, double clicking on the danger zone button makes other edit fields appear like the job interval time (and previously version rolling frequency thing but it was removed now)^^

edit2: Nonce evaluation says all bits in the nonce are now used during mining - this is really nice, love that! 🥰

@warioishere
Contributor Author

warioishere commented Apr 10, 2026

Before merging - the OCTAXE search space is ~31s at 9 TH/s, or ~28s when overclocked to 10 TH/s. At your 10s job interval that works, but overclocked devices could still hit duplicates if the job interval exceeds the search space. This affects all modes, not just Standard Channel.

As Sjors suggested we could bump ntime every second unconditionally. That would:

  • make the search space problem go away for all modes and all hashrates
  • potentially allow removing the 500ms job switching entirely
  • but dual pool scheduling would need to be adapted to use ntime ticks instead of job switches

How do you want to proceed - add ntime rolling to this PR and remove job switching, or merge as-is and handle it separately?
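
The ~31s and ~28s figures are consistent with a per-job search space of 2^48 hashes, i.e. a 32-bit nonce combined with 16 version-rolling bits. That decomposition is an assumption made for this sketch and is not stated in the thread; the function names are hypothetical as well.

```cpp
#include <cassert>
#include <cmath>

// Assumed per-job search space: 2^32 nonces x 2^16 rolled versions = 2^48.
constexpr double SEARCH_SPACE_HASHES = 281474976710656.0;

// Seconds until the device has exhausted the search space of a single job.
static double searchSpaceSeconds(double hashrate_ths) {
    return SEARCH_SPACE_HASHES / (hashrate_ths * 1e12);
}

// True if a new job arrives before the search space wraps around.
static bool jobIntervalIsSafe(double job_interval_s, double hashrate_ths) {
    return job_interval_s < searchSpaceSeconds(hashrate_ths);
}
```

Under that assumption `searchSpaceSeconds(9.0)` is about 31.3 and `searchSpaceSeconds(10.0)` about 28.1, so a 10s job interval is safe at both hashrates while a 30s interval would not be at 10 TH/s.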

@shufps
Owner

shufps commented Apr 10, 2026

Before merging - the OCTAXE search space is ~31s at 9 TH/s, or ~28s when overclocked to 10 TH/s. At your 10s job interval that works, but overclocked devices could still hit duplicates if the job interval exceeds the search space. This affects all modes, not just Standard Channel.

The 10s were just a test to confirm that we use more bits of the nonce than before and we don't generate duplicates.

How do you want to proceed - add ntime rolling to this PR and remove job switching, or merge as-is and handle it separately?

no, this PR is only about fixing the nonce search space.

The other is only for SV2 Extended for now.

ntime rolling, we will see.

@shufps
Owner

shufps commented Apr 11, 2026

nice, I didn't see any duplicates on the QX with >=1100MHz.

Have to test the NQ and NerdAxe too because they use BM1368 and BM1366.

But this looks really nice 🥰

edit: NQ+ works too ✔️

@shufps
Owner

shufps commented Apr 12, 2026

I guess I'll just merge this and the other one and release a new beta^^

@shufps shufps marked this pull request as ready for review April 12, 2026 06:17
@shufps shufps merged commit 784cae0 into shufps:develop Apr 12, 2026